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Abstract 

CANDECOMP/PARAFAC (CPD) approximates multiway data by sum of rank-1 tensors. Our recent 
study has presented a method to rank-1 tensor deflation, i.e. sequential extraction of the rank-1 compo¬ 
nents. In this paper, we extend the method to block deflation problem. When at least two factor matrices 
have full column rank, one can extract two rank-1 tensors simultaneously, and rank of the data tensor 
is reduced by 2. For decomposition of order-3 tensors of size RxRxR and rank-A 1 , the block deflation 
has a complexity of 0(R 3 ) per iteration which is lower than the cost 0(R 4 ) of the ALS algorithm for the 
overall CPD. 


Index Terms 

canonical polyadic decomposition (CPD), CANDECOMP/PARAFAC, tensor deflation 

I. Introduction 

An important property in matrix factorisations like eigenvalue decomposition or singular value de¬ 
composition, is that rank-1 matrix components can be sequentially estimated via deflation method, such 
as the power iteration method. The matrix deflation procedure is possible because subtracting the best 
rank-1 term from a matrix reduces the matrix rank. Unfortunately, this sequential extraction procedure 
in general is not applicable to decompose a rank-/? tensor |[Q. 
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In our recent study 0, 0, we have introduced a tensor decomposition which is able to extract a rank-1 
tensor from a high rank tensor. The method is based on the rank-1 plus multilinear-(A - 1,/? - I. A' - 1) 
block tensor decomposition, but with a smaller number of parameters, only two vectors per modes. This 
paper extends the rank-1 tensor extraction to block tensor deflation or rank splitting which splits a high 
rank-A tensor into two tensors with smaller ranks. In particular, we develop an alternating subspace update 
(ASU) algorithm to extract a multilinear rank-(2,2,2) tensor from a rank-A tensor. Since decomposition 
of a 2 x 2 x 2 tensor can be found in closed-form, we can straightforwardly obtain the desired rank-1 
components. The proposed algorithm estimates only 4 vectors and two scalars per dimension with a 
computational complexity of 0( A 3 ). Moreover, it also requires a lower space cost than algorithms for the 
ordinary CANDECOMP/PARAFAC (CPD). 

The paper is organised as follows. A tensor decomposition for block tensor deflation or rank splitting 
is presented in Section |IT] The proposed algorithm is presented in Section UTI] Simulations in Section |TV] 
will verify validity and performance of the proposed algorithm. Section [V] concludes the paper. 

II. Preliminaries 

Throughout the paper, we shall denote tensors by bold calligraphic letters, e.g., A e !/' x/2X "'XAq 
matrices by bold capital letters, e.g., A =[a\, a ^,..., Ur] e WiJ xK , and vectors by bold italic letters, 
e.g., Uj. The Kronecker product is denoted by ®. Inner product of two tensors is denoted by (X, y) = 
vec(X) T vec(y). Contraction between two tensors along modes-m, where m = \m \,..., m K \, is denoted 
by (X,y) m , whereas (X, 11)-,, represents contraction along all modes but mode- 77 . Generally, we adopt 
notation used in Il4l . 

The mode-/? matricization of tensor y is denoted by Y ( „,. The mode-/? multiplication of a tensor 
y e g/ixAx-x/ ;V ^ a ma trix U 6 R i,,xR is denoted by % = y x„ U e ^Ax-x 4 -ixRx/„ +1 x-x/a,_ p ro ducts 
of a tensor y with a set of N matrices {U w } = jlJ (1) ,U (2 \..., U (A,) } are denoted by y x {U (n) } = 

y xiu tl) x 2 u (2) ---x^uw. 

A tensor X e rAx^x-x /„ j s sa jq j n Km S p a i form if 

X = ^ A r a\ l) o a* 21 o • • • o a \ N) , (1) 

r= 1 

where “o” denotes the outer product, A 00 = \a { "\a "\ ..., \ e mJ" xK are factor matrices, a " >T a," } = 1, 

for r = 1,..., A and n = 1 ,,N, and Ti > A 2 > ■ ■ ■ > Ar > 0. 

A tensor X 6 R.AxAx-xi/v has multilinear rank-(Aj, A 2 ,..., A v ) if rctnk(X (ll) ) = R n < /„ for n = 1,..., N, 
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and can be expressed in the Tucker form as 

R R R 

n=lr 2 =l r » = 1 

where S = |g n r 2 ...r lV h and A (n) are of full column rank. For compact expression, [[T; {A 00 }]] denotes a 
Kruskal tensor, where |[G; {A ( "*}J represents a Tucker tensor. 

The main focus of this paper is a block deflation which splits a rank-A CPD into two sub rank-k' and 
rank-(A - K) CPDs. This tensor decomposition is a particular case of the block tensor decomposition 0 
but with only two blocks of multilinear rank-(X, K, K) and rank-(A - K,R - K,R - K) as illustrated in 
Fig. [Q That is 

y « [S;U (1) ,U (2) ,...,U (JV) J + ETC; V (1) , V a) ,..., V (A °]] + £ (3) 

where U ( '' ) and V l,!) are matrices of size /„ x K and I n x (A - K), respectively. Following this tensor 
decomposition, decomposition of a rank-A tensor can proceed simultaneously through decompositions of 
sub-tensors with smaller ranks. When K = 1, we have the rank-1 tensor deflation discussed in Part-1 |[3j 
and Part-2 |6|. 

For this kind of tensor decomposition and block tensor deflation, we can use the ALS algorithm 0 
or the non-linear least squares (NLS) algorithm [j7] developed for the multilinear rank -(L r ,M r ,N r ) block 
tensor decomposition with two blocks. Flowever, these existing algorithms are expensive due to a large 
number of parameters of the two core tensors S and TT. The proposed algorithm will estimate only four 
vectors of length R per dimension whereas the core tensors S and TC need not to be estimated. 

We will first introduce an orthogonal normalisation for the block tensor deflation, then state the 
correctness of the proposed deflation scheme. 

Lemma 1 (Orthogonal normalization for rank splitting). Given a decomposition of}$ as 31 ~ ES; U (1 \ U (2 \ ..., U ( Vl ]|+ 

EfK; V(P,yl 2 ).V W) J, where U (n) e R , " x ® and V (,l) e WJ" X(R ^ K \ K < R - K, one can construct an 

equivalent decomposition, denoted by tildas, which has the same approximation error, such that 

• ES;{U (n) }] = ES;{U (,,) }]]. ETC;{V (n) }]| = ETC;{V ( ' !) }] 

. U (n) and V ( ' ;) are orthogonal, i.e., (U (,2) ) r U (,,) = l K and (V ( " ) ) r V (,,) = l R . K . 

• and obey conditions V ( ' !) = [diagjrr,,}, 0 R _ 2 /f] where cr n = [cr„ 1? ..., cr n K \ e R.^ and 0 < cr n r < 

1. 

Proof: See Appendix |A] ■ 


June 17, 2015 


DRAFT 


4 



Fig. 1. Rank splitting for CP decomposition of a rank-R tensor into two multilinear rank-(R,..., K) and rank-(R- K,.. ,,R-K) 
tensors S and IK. 


Theorem 1 (Rank splitting). A rank-R tensor = [[/?; {) [| has an exact decomposition as in @ 

y = IS; u (1) , ..., U tA,) ]| + ITC; V (1) ,..., V w ] 

where U (,i) £ W" xK and V (n| £ R f x ( R - K \ K < R - K and 

• at least two factor matrices IV"- 1 £ M. I,,xR are offidl column rank, 

• 9 has multilinear rank-(K ,..., K). 

Then 9 is a tensor of rank-K and 3T of rank (R — K). 

Proof: See Appendix |B] ■ 

III. Alternating Sub space Update Algorithm 

In this section, we consider order-3 tensors of size RxRxR. Tensors of larger and unequal sizes should 
be compressed to this size using the Tucker decomposition ll8ll— ifTOl . We will develop an algorithm for 
the block tensor deflation which reduces the rank by K = 2. For this particular case, the core tensor 9 is 
size of 2 x 2 x 2, and the core tensor !K of size (R - 2) x (R - 2 x (R - 2). The factor matrices U (,i) and 
V ("> are of size fix2 and fix(fi-2), respectively. The rank-2 block deflation has an advantage over the 
rank-1 tensor deflation when factor matrices have two nearly collinear components. 

We denote matrices V (,i) = [r'”’, vf l> \ which comprise the first two columns of V w , and perform 
reparameterization of U (n) as 

U« = w (,,) diag^J + V (n) diag(o-„), (4) 
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where = [g„uf„ 2 ] T , £>„■ = VI - and W ( '° = |w> l | ' i \ of size R x 2. [W ( " ) ,V ( " ) ] are orthonormal 
matrices of size /? x /?, i.e., [W (n) , V (n) ] r [W (n) , V (n) ] = I*. 

Consider the following criterion to be minimized, 

D = l -\\y - S X {U ( " ) } - ‘K x {V w )||| . (5) 

The ALS algorithm © and the non-linear least squares (NLS) algorithm Q consider the same optimi¬ 
sation criteria. We will later simplify the objective function in © by replacing the core tensors by their 
closed-form expressions and applying the above reparameterization. The objective function will finally 
depend only on W tn) , V 00 and er„ for n = 1,2,3. 


A. Closed-form expressions for the core tensors 

The first derivatives of the cost function D in © with respect to the core tensors S and IK are given 


by 


% = x {U (,l)r } + S + IK x {diagfcr„)}, 

oS 


^ = -y x {v ( ” )T } + s x 


+ “K, 


diag<tr„) 

0(R-2)x2 

where !K = !K( 1:2,1:2,1:2). We obtain closed-form expressions for IK and S as 

« = j)x(vW)-9x([ diag(,T " ) 

( 0(R-2)x2 

S = (y x {U (n)r } - (y x {V ( " )r }) © s) 0 (1 - S © S) , 


( 6 ) 

(7) 

( 8 ) 
(9) 


where S = o"i 00 - 200-3 is a rank-1 tensor of size 2x2x2, © and 0 represent the Hadamard (element-wise) 
product and division, respectively. 

We replace ‘K in the cost function © by its closed-form in ©, and rewrite D as 
D = ^||y - y x jv (n) v wr ) - S X {U ( ' !) } + s X {V w diag(rr„)}||| 

= \ (liy - y X {v (n) v (n)7 ') III + IISIll + IIS X {diag(cr„)}||| 

-2<S X {U w }, s X {Vdiag(o- fl ) (n) }> -2<y - y x {v w V (n)r ), S x {U ( ">}>) 

- ^(yyiil-iiyx{v«yw r )ii 2 -<s®(i-s®s),s>). cio) 
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For an index n e {1,2,3}, define n\ and n 2 with n\ < n 2 as its complement in {1,2,3}, i.e., {n,n l ,n 2 } = 
{1,2,3}. Put 


tin) _ -u x H («i ) T v 


= y x «, v 


OnT v v («2)r 

A «2 ”.s 


“»• r l /-,s u n\,r u nz,s • 


The objective function in (flQl) can be expressed as 


D - - 

2 


IIHIlJ - IIS x (v<">v'"’ r ) 6 -XZZ 1 


222 r„(i)r,m _ „<i>r {ii ,23 

V w r, *r 2 ,r 3 y n *>r 2 ,rx) 


r i = l r 2 =l r 3 =l 


2 2 2 
erf erf erf 
l,n 2,r 2 3,r 3 / 


iiyii|-nyx{v ( " ) v wr )i|2 - 


h (fl.r,^r ( M +°'l,n v n )r 4! , n) 2 ' 


n,r2,r3=l 


1 — cr? cr? cr? 

1,0 2,7-2 3,7-3 


( 11 ) 

( 12 ) 

(13) 


(14) 


B. Estimation of cr n 

We begin with deriving update rules for (T\ = [ cr 11 , cr L2 1. As shown in the cost function in (fT4l) . the 
parameters cr, involve only the third term. In order to estimate <T\, we keep other parameters fixed. Then 
minimization of the cost function (fl4l) leads to maximization of the function of cr\ 


max 

cp 1,1,cr i, 2 


2 2 2 

zzz 

7-1 = 1 7"2 = 1 7-3 = 1 


(tUnK 1 / 1 tYfr, + cr Ul v ( ^ T d^ r f) 2 


1 — cr? cr? cr? 
1,0 2,7-2 3 , 7-3 


(15) 


Each cr l n is found as <Ti = 1/ J 1 + xf^ where x n is solution to the problem 


r - are max V V 
x n — argmax > > 

x , , x 2 + 1 - cri , cri , 

r 2 = l r 3 = 1 Z,r 2 3,r 3 


(16) 


a r . 


= w 


(i)r # (i) 


t\^ r , and /l,,,,r, = vyj >T cl^f. . The optimal x n is a root of a polynomial of degree-8. The other 


cr nr can be estimated similarly. 


C. Estimation of orthogonal components W (,i) and \ <n> 

This section will present update rules which preserve orthogonality constrains on W (,i) and \ <n> . Indeed 
we only need to update \\ <n> and the first two column vectors V (,j) = [v < " ) , whereas the last (R - 4) 
columns [Vj" 1 ,..., v^ 2 ] are chosen as arbitrary orthogonal complement to V (n) ]. 

Since V ( " , V ( " )r = I* - W ( " ) W ( ” ,r , we have 

||y x (y ( '4y(") r j ||| = tr(O n ) - tr(W ( " )r O„W (n) ) (17) 
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where O,, = Y ( „ } \ (n) \ <k)T 'j Yj n) are matrices of size RxR. The cost function in (fl4l) is rewritten 


as 


where 


D - - 
2 


\m 2 F - tr(®„) + 2 w? )T Q n. r w™ - v[ n)T F v v("> - 2 w[ n)T K n r v 


00 


r—1 


k,l 


An) Xn) T 
l kJ 1 k,l 


i 9 2 

err ..cr , <X . 

«2,/ 


= cr 


2 Y — 

kj 


d (n) d (n)T 

u k,i u k,i 


K„ 


k,l 1 


<y ni ,y ni ,i 

Jn) i(n) T 
l k,l U k,l 


t 2 2 

cr- ,.cr , cr , 

n,r m ,k ri2,l 


(18) 


(19) 


( 20 ) 


It follows that W (n) and V (n) arc solutions to the following quadratic optimisation 

2 2 


min/(W ( " , ,V ( " ) ) = 


X W ^ T Q nr "Y - V ( ; )r F n,r - 2 £ K„, r V <"> 


\r= 1 


r= 1 


( 21 ) 


subject to [W (n) V f ' !, r / |W ( " ) V (n) ] = I 4 . 


Following the Crank-Nicholson-like scheme fill , we can update the orthogonal matrices X„ = [ W (, '\ V (,,) | 
with XjX„ = I 4 using the following rules 


X„ - 2r[G/, X„] 


lx + T 


X,{G 




n W 


h 


-G T f G f -G T f X n v 


h 

-G T f X n 


( 22 ) 


where Gf = [g fw M,g f w,g f w,g f m] of size i?x4 are the first order derivatives of the function 
/(W , ' !) ,V" J) ) with respect to [W ( ' !) , V (n) ] 

df 


Sf,w? - 

= 


dw[ n) 

df 

dv 


( n ) 


= Qn,rW[ n) ~K nJ V { r n) , 

= -¥ n ^-K 


(23) 

(24) 


and r„ = X' n Gf and r > 0 is a step size chosen using the Barzilai-Borwein method ffl2l . Each iteration 
to update X„ = V (n) ] inverts a matrice of size 4x4. 

We finally derive update rules for all parameters. The proposed Alternating Subspace Update (ASU) 
algorithm is summarized in Algorithm Q] The algorithm alternating updates cr„ and [W™, V (n) ] for n = 
1,2,3. The entire factor matrices Y (,,) and core tensors S, TC are computed only once. 
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Algorithm 1: Alternating Subspace Update (ASU) 

Input: Data tensor U: (R x R x R) of rank R 

Output: A rank-(2,2,2) tensor [S; |U < ' I> )I and rank-(/? - 2 ,R - 2,R - 2) tensor |[3-C; {V*" 1 )]] 

begin 

t Initialise components U (n) and V (n) 

2 Orthogonal normalization to U (n) and V (,!| and compute cr„ = [cr„j, fr„ 2 J 7 and W (n) 

repeat 

for n = 1,2,3 do 
for r = 1,2 do 

3 |_ Update cr nj = -^=== where x is solved as in 1161 

4 Compute G f as in {23j and d T„ = X T n G f where X„ = [W <n) , V l,,) ] 

s Update X„ = [W (n) , V < " ) ] as in d 

6 |_ U' n) v- W ( "’ diag(f„) + V w diag(<r„) 

until a stopping criterion is met 

for n = 1,..., N do 

7 Select Vj"^_ 2 as an orthogonal complement of [W ( "\ V ( " ) ] 

8 Compute output S and TC as in <(9]) and <[8]» 


The most expensive step in the ASU algorithm is computation of the matrices O,, = Y (n) {®ktn Y (,n Y (k)T ^ Y 
A naive computation method might cost 0(R 4 ). We present a more efficient computation which requires 
a cost of order 0(R 3 ) 


O,, = Y ( „) ((I - W (,!2, W , ' ,2)7 ') ® (I - W f " l, W ( " l)r )) Yj n) 

= Y (n) Yf B) - Y in) (W^W^ T ® I)Yf B) 

-Y ( „) (I ® W ( " l) W ( " l)r )Y ( r „ ) + Y {n) (W M W (n2)T ® W ( " l) W ( " l)r )Y^ ) 

= Y (n) Y T (n) - (y X ni W (H1) , y X ni w M ) nun2 

-<y x „ 2 w ( " 2 \y x„ 2 w fe >> ni ,„ 2 - (y x„, x„ 2 w M ,y x ni w M x„ 2 w (n2) > ni ,„ 2 , 


where {n\ < nf\ = {1,2,3} \ {n}. 

The first term Y ( „) Y ( ^ ;) is computed only once. The mode-///, tensor productions y x„ k W < " t) yields a 
tensor comprising two slices of size RxR with a computation cost of 0(R } ). 


IV. Simulations 


Example 1 [Decomposition of small tensors admitting the CP model.] In this first example, we illustrate 
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the block deflation of tensor of size RxRxR and of rank R where R = 10,20,30. The weight coefficients 
A r were set to 1, whereas collinearity degrees between components a ( f ] and a ( ' l> for all r + s were 
identical to a specific value c, which was varied in the range [0, 0.9], a' r)1 a'" ) = c and a," ]1 'a" ] = 1 
for all n (see Appendix F in |(6]). We use the subroutine “gen_matrix" in the TENSORBOX llT3ll to 
generate factor matrices with specific correlation coefficients. 

We compare the ASU algorithm with the ALS algorithm Q for the multilinear rank-(L,., M r , N r ) block 
tensor decomposition with two blocks. For this problem, one can use the non-linear least squares (NLS) 
algorithm (7j]. However, as similar to the ALS algorithm 0, the NLS algorithm needs to estimate two core 
tensors and full factor matrices. Hence this algorithm is much more expensive than the ASU algorithm. 
Simulations were run on a Macbook-air laptop having 4 GB memory and a 1.8 GHz core i7. Due to 
space and time consuming, the ALS 0 was only ran in simulations for R = 10. 

The algorithms were initialised by the same values generated using the Direct Trilinear Decomposition 
(DTLD) Ifl4l . The algorithms ran until differences between consecutive approximation errors were small 
enough, |e* - Sk+ 1 | < HU 6 e*. where s = \\y - fy\\ 2 F , or when the number of iterations exceeded 1000. 
Rank-1 tensors were then obtained from decomposition of blocks of rank-2. Performances were assessed 
through the squared angular errors SAE in estimation of components a in> SAE = arccos . There 

were 100 independent runs for each rank R = 10,20 and 30. The Gaussian noise was added into the 
tensor with signal-noise-ratio SNR = 30 dB. 

Fig. [2] shows median SAE (MedSAE) in dB (-101og 10 S'A£') obtained by ASU and ALS (9 compared 
with the Cramer-Rao Induced bound (CRIB) lfT5l on the squared angular error. Algorithms succeeded in 
most cases, but failed only when c = 0.9. For such difficult scenarios, CRIB on SAE was about 17.8 dB, 
indicating median angular error of 7.4 degrees between the original and estimated components. We note 
that in practice, it is hard to estimate a component with CRIB less than 20 dB, i.e., angular error of 5.7 
degrees fl6l . 

In Fig. |WJ we compare execution times (in second) of algorithms for different ranks. Since the 
decomposition became more difficult when c was close to 1, running times of algorithms increased 
as shown in Fig. |TVl The ASU algorithm was on average 8 times faster than ALS 0 when R - 10. 

The results confirmed high speed and accuracy of the proposed ASU algorithm. 

Example 2 [Decomposition of large-scale tensors with high rank] This example illustrates an advantage 
of ASU over existing algorithms for the ordinary CPD in decomposition of large-scale tensors with 
relatively high rank R = 300 and 500. We generated rank-R synthetic tensors of size RxRxR as 
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c 


c 


Fig. 2. Comparison of median SAEs and execution times of the ASU and ALS algorithms m in decomposition of tensors of 
size Rx Rx R and rank R where R = 10, 20 and 30 for Example Q] 


in the previous example. Components a { , n) and a!] ) for r ± s have identical collinearity degrees, i.e., 
a ( , n>T a[ n) = c where c - 0.1,0.2,..., 0.6. The Gaussian noise was at SNR = 30 dB. Simulations were 
run on a computer consisted of Intel Xeon 2 processors clocked at 3.33 GHz, 64GB of main memory. 
Extraction of all components is expensive in both computation time and space. The main reason is that 
CP gradient computation is with a cost of 0{R 4 ) IfTTl . For such big tensors, sequential extraction of 
rank-1 tensors using the ASU algorithm is more efficient. The ASU algorithm is particularly suited to 
tracking a few components without estimation of the full CP model as other algorithms. In this example, 
ASU could extract components after, on average, only 3.8 seconds for R = 300, and 20 seconds when 
R = 500. Decomposition of the same tensors using the FastALS algorithm for CPD liTTl on average 
needed 538 and 3675 seconds, respectively. Comparison of execution times of ASU and FastAFS IfTTl 
is given in Table H) 

Example 3 [Comparison of rank-1 and block tensor deflations] 

This example presents a case when the block tensor deflation is more appropriate than the rank-1 tensor 
deflation. We considered tensors whose factor matrix A (1) comprised two highly collinear components. 
More specifically, we first generated rank-R synthetic tensors of size R x R x R where R = 10 as tensors 
in Example [T] i.e., a,"’ 1 'a,"' = 1 and afl 1 a = c for all r + s and 0 < c < 1. The component a'-, 11 was 
then adjusted so that its collinearity degree with a was of p = 0.98 

a!, l} := (p ~ ca)a\ l) + aa\ l) (25) 
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TABLE I 

Comparison of execution times of the ASU algorithm to extract two components from high rank-R tensors, and those of the 

CP-FastALS algorithm for Example [2] 


Execution time (second) 



c = 0.1 

0.2 

0.3 

0.4 

0.5 

0.6 

R = 300 







ASU 

3.81 

3.66 

3.76 

3.82 

3.89 

3.77 

CP-FastALS 

530.6 

543.5 

537.6 

537.6 

541.9 

539.2 

R = 500 







ASU 

38.4 

16.7 

16.5 

16.9 

16.8 

17.1 

CP-FastALS 

3658 

3672 

3679 

3693 

3678 

3669 


where a = 1 - p 2 )/( 1 - c 2 ). Collinearity degrees between a\ l ’ and the other components a[ l ’ for r > 2 

were then given by 


a\ l)T a { P = c{p + a( 1 - c)). 


(26) 


Since a\ l> or a!, 1 * were highly collinear, extraction of only one rank-1 tensor associated with or a[ l) is 
difficult as analysed in Part 2 (6j. We will show that there are loss of accuracy in extraction of the rank-1 
tensor a^oo^oa®, compared with block tensor deflation which extracts two rank-1 tensors comprising 
components a\ ]) or a^\ For this comparison, we initialised the ASU algorithm (ASU-1) 0 for the rank-1 
tensor deflation and the ASU algorithm proposed in this paper (ASU-2) by the true components. The 
mean SAEs (dB) of estimated components achieved by the two algorithms shown in Fig. [3] indicate that 
the loss varied from 0.37 dB to 2.5 dB when c increased from 0.1 to 0.9. 

In another simulation with similar settings, we compared ASU-1 and ASU-2 when the factor matrices 
A (1) and A (2) comprised two highly collinear components a^ )T a { ^ ] = ci { ~ )T ci {2) = 0.95. It is necessary to 
remind conditions for the rank-1 tensor deflation, i.e, conditions for ASU-1. According to Lemma 2 in 
Part 1 0, a rank-1 tensor can only be uniquely extracted if at least two components do not lie within the 
column spaces of the other components. Since the two components a^' and were highly collinear 
with a'-! ' and a 2 \ respectively, the rank-1 tensors a\ 11 o a 2) o a f 1 and a!, 1 * ° a'-T* ° a?' can be considered 
to violate the condition. Extraction of one of the two rank-1 tensors is not stable. Instead, they should 


be extracted together. It is shown in Fig. |3(b)| that the loss of accuracy of ASU-1 was higher for this 
difficult decomposition. 
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C C 

(a) Highly collinearity in A 111 . (b) Highly collinearity in A' n and A’ 21 . 

Fig. 3. Comparison of mean SAEs (MSAE) achieved by the ASU algorithms for rank-1 tensor deflation |3| and block tensor 
deflation for Example [3] 


V. Conclusions 

We have introduced a rank-splitting scheme for CPD, and developed an ASU algorithm for rank- 
2 block deflation. The algorithm needs to estimate only 4 vectors and two scalars per dimension, 
and has a computational cost of 0(R 3 ) for a tensor of size R x R x R. The algorithm can be ex¬ 
tended to higher order tensors, and decomposition with additional constraints. Algorithms for the block 
tensor deflation are implemented in the Matlab package TENSORBOX which is available online at: 
http://www.bsp.brain.riken.jp/~phan/tensorbox.php 


Appendix A 
Proof of Lemma [I] 

Proof: Let Q„ and F„ be column space of U ( "\ and V ( "\ respectively, which can be obtained from 
QR decompositions 

U (n) = Q„ R„, V (n) =F„K„. 

Consider singular value decomposition (SVD) of Q „ 7 F„ = r„E„ T'[ where T n e R KxK , *P„ e lA xl7 '’ " A 1 
and = [diag{cr„},0 R _2/f], cr„ 6 Kf. Then, the new decomposition is equivalently defined through 

U ( "> = Q„ r„, n = l,...,N, (27) 

S = Sx, (r[ Q[u (1) ) • • • X/v ( r^Q^U w ) , (28) 
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and 

V (n) = F„ *F„, n=l,...,N, 

M = tKxi(Vf F[V (1) ) ■ • • x N (W T N F T N \ m ) . 
It can be verified that U® and V® are orthogonal and 

(U®) 7 V" 1 ) = T T n Ql F„ *¥„ = Z n . 

This completes the proof. 


(29) 

(30) 


(31) 


Appendix B 
Proof of Theorem [Tj 


Proof: For simplicity, we assume that B® and B (A,) are of full column rank. Since 

G ( „) (Ofat,, U®) 7 


Y ( n) = B„ diagf/f} | O B®] = [ u ® V ®] 


H (n) (OfetB V®) 


U (1) , V (1) and \J (N \ V tA0 are also full column rank matrices. 

Thanks to Lemma [T] we can assume, without any loss in generality, that the factor matrices U ( ' ,) and 
V® for n = 1 and n = N, obey the normalization condition, i.e., U® r U® = I K , V® 7 V® = I R _ K and 
U (n>r V® = [diag(cr„), 0 K x(r-2K)] where cr„ = [o- nJ , cr n2 ,.. ■, cr, hK \ T 6 R* and 0 < cr, hk < 1. 

Let Z N = [V®,..., z®*] be an I N x K matrix whose columns are defined as 


JN) _ 



- CT N,k V 


(N) 

k 


1 - CT 


2 

N,k 


k=l,...,K. 


(32) 


We have Z T N V (N) = 0 and Z^U W) = I K . Put W = Z^B ( ®, the tensor-matrix product y x N Z T N is given 
by 


Vx n Z t n = ; B®,..., B^ _1) , W«]|, (33) 

where R denotes set of indices of non-zero columns w k ± 0 for k e R, B® = B®(:, A) are sub matrices 
taken from B® and (i R = J3(R). 

From the block term decomposition of y, we also have 

yx N Z T N = IS; u (1) ,..., l K J , (34) 


which leads to 


S = Wk ; U® r B®,..., U®- 1)r B®- u , W R ]|. 


(35) 
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Hence, the expression in (1341) is equivalently rewritten as 

y XyvZ^ = I^;U (1) U (1)r Bj ) ,...,U ( ^- 1) U w - 1)T B^- 1) ,W R I (36) 

Since B^ 1 ’ is a full-column rank matrix, the CPDs in (l33l) and (l36l) are unique and therefore identical. It 
follows that 

(Ijc - U (n) U (n)r ) Bg? = 0, n = l, ... ,N - l. (37) 

That is B^ 1 are spanned by U 1 -" 1 for n = 1,.. . ,N - 1, respectively. In addition, since S has multilinear 
rank-(7f,..., K), from ( 1351 ). B,^ 1 must be of size I\ x K, and can be expressed as 

= U (1) Qi (38) 

where Q! is a full-column rank matrix of size K x K. Implying that S is a rank-A" tensor, and uniquely 
identified 


S = Wk ; Qi, U (2)r b®, ..., uw B^- 1 ', W«J. 

Similarly we can prove that 

y x t Z[ = lfi x ; U (1) Z[ B$|\ U (2) U (2)r B| 2) ,..., U (A0 U fA,)7 B^° || 

d 

S = ^ v ;Z[B” ) ,...,U (A '- 1)r B^" 1) ,U wr B^ ) ]|, 


(39) 


(40) 


where X is an index set of K non-zero columns Z[B (1) . 

Since the first and the last factor matrices in the CP decompositions of G in (l39l) and in (l40l) are of 
full column rank, the decompositions are unique. Therefore, the two sets % and X are identical, and the 
tensor |[S; U ( ..., U (,V) |] is a rank-A" tensor taken from K rank-1 tensors of the tensor y, 

IS;U ( 1 ) ,...,u (JV - 1 ) ,u w ] = BjS«;B|| ) ,...,Bf 1) ,Bf]. (41) 

Finally, it is obvious that eliminating the rank-A" tensor |[S;U (1 \..., U f,V) ]| from y remains a rank-(A- K) 
tensor, i.e. 3T is a rank-(A - K) tensor. 
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